Automated labeling in document images
Abstract
The National Library of Medicine (NLM) is developing an automated system to produce bibliographic records for its MEDLINE database. This system, named the Medical Article Record System (MARS), employs document image analysis and understanding techniques together with optical character recognition (OCR). This paper describes a key module in MARS, the Automated Labeling (AL) module, which automatically labels all zones of interest (title, author, affiliation, and abstract). The AL algorithm is based on 120 rules derived from an analysis of journal page layouts and features extracted from OCR output. Experiments on more than 11,000 articles from over 1,000 biomedical journals show that the accuracy of this rule-based algorithm exceeds 96%.
Keywords: OCR, automated data entry, automated zoning, automated labeling, rule-based algorithm, MARS, NLM
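The rule-based approach described in the abstract can be illustrated with a minimal Python sketch. It is not the MARS implementation: the Zone fields, the four rules, and every threshold below are hypothetical, chosen only to show how ordered rules over layout and OCR-derived features might assign labels. The actual AL module uses 120 rules that are not reproduced here.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Zone:
    """A text zone on a scanned page (hypothetical feature set)."""
    text: str         # OCR text of the zone
    top: float        # vertical position: 0.0 = top of page, 1.0 = bottom
    font_size: float  # average character height from OCR, in points


# Illustrative keyword list; an assumption, not taken from the paper.
AFFILIATION_WORDS = re.compile(
    r"\b(university|department|institute|hospital)\b", re.IGNORECASE
)


def word_count(zone: Zone) -> int:
    return len(zone.text.split())


# Each rule is (label, predicate). Rules are tried in order and the
# first one that matches wins. All thresholds are made up.
Rule = Tuple[str, Callable[[Zone, float], bool]]

RULES: List[Rule] = [
    ("title", lambda z, max_font: z.font_size == max_font and z.top < 0.4),
    ("affiliation", lambda z, max_font: word_count(z) < 50
        and bool(AFFILIATION_WORDS.search(z.text))),
    ("abstract", lambda z, max_font: word_count(z) >= 50),
    ("author", lambda z, max_font: z.top < 0.5
        and word_count(z) <= 30 and "," in z.text),
]


def label_zones(zones: List[Zone]) -> List[str]:
    """Return one label per zone; zones matching no rule get 'other'."""
    if not zones:
        return []
    max_font = max(z.font_size for z in zones)
    labels = []
    for zone in zones:
        label = "other"
        for name, predicate in RULES:
            if predicate(zone, max_font):
                label = name
                break
        labels.append(label)
    return labels


if __name__ == "__main__":
    page = [
        Zone("Automated labeling in document images", 0.10, 18.0),
        Zone("A. Author, B. Writer", 0.18, 11.0),
        Zone("Department of Examples, Sample University", 0.24, 9.0),
        Zone("word " * 60, 0.40, 10.0),
        Zone("12", 0.97, 8.0),
    ]
    for zone, label in zip(page, label_zones(page)):
        print(f"{label:12s} {zone.text[:40]!r}")
```

Ordering the rules matters in this sketch: the affiliation rule is checked before the author rule because affiliation lines often contain commas as well. A real system would need many more features and rules to approach the accuracy reported above.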
Similar resources
Automated Document Labeling Using Integrated Image and Neural Processing
As part of our effort to develop an automated data entry system to identify and convert bibliographic information from paper-based documents to electronic format for inclusion in the MEDLINE database used worldwide by biomedical researchers and clinicians, we have implemented a new technique for automatically labeling zones from scanned images with meaningful labels such as title, author, affi...
Face Detection with methods based on color by using Artificial Neural Network
Face detection methods are used to provide security. The problem with these methods is that faces cannot be easily categorized because of the great differences and variety among the faces of individuals. In this paper, a face detection method based on skin color data is presented to overcome these problems. The researcher gathered a face database of 30 individuals consisting of ov...
A Semi-Automated Algorithm for Segmentation of the Left Atrial Appendage Landing Zone: Application in Left Atrial Appendage Occlusion Procedures
Background: Mechanical occlusion of the left atrial appendage (LAA) using a purpose-built device has emerged as an effective prophylactic treatment in patients with atrial fibrillation who are at risk of stroke and have a contraindication to anticoagulation. A crucial step in procedural planning is the choice of the device size. This is currently based on the manual analysis of the “Device Landing Zone” fr...
Automated classification of pulmonary nodules through a retrospective analysis of conventional CT and two-phase PET images in patients undergoing biopsy
Objective(s): Positron emission tomography/computed tomography (PET/CT) examination is commonly used for the evaluation of pulmonary nodules since it provides both anatomical and functional information. However, given the dependence of this evaluation on physician’s subjective judgment, the results could be variable. The purpose of this study was to develop an automated scheme for the classific...
Novel Automated Method for Minirhizotron Image Analysis: Root Detection using Curvelet Transform
In this article, a new method is introduced for distinguishing roots from background in minirhizotron images based on their digital curvelet transform. In the proposed method, a nonlinear mapping is applied to the sub-band curvelet components, followed by boundary detection using an energy optimization concept. The curvelet transform has an excellent capability for detecting roots with different orient...